Simulate three data sets according to three models
Vary sample sizes and effect sizes
\(n \in \{25, 50, 100, 200, 400, 800\}\)
\(R^2 \in \{0.02, 0.09, 0.25\}\)
\(R^2 = \frac{\text{Var}(X\boldsymbol{\beta})}{\text{Var}(Y)}\).
Fully crossed design (18 conditions per simulation)
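The fully crossed design can be enumerated directly. The following is a minimal Python sketch. The names `SAMPLE_SIZES`, `R_SQUARED` and `conditions` are illustrative and do not come from the original study code:

```python
from itertools import product

# Design factors as listed on the slide
SAMPLE_SIZES = [25, 50, 100, 200, 400, 800]
R_SQUARED = [0.02, 0.09, 0.25]

# Fully crossed: every sample size is paired with every effect size
conditions = list(product(SAMPLE_SIZES, R_SQUARED))

print(len(conditions))  # 18
```

`product` varies the last factor fastest, so the conditions run from `(25, 0.02)` to `(800, 0.25)`.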
4 predictor variables \(X\) and an outcome \(Y\) (continuous or binary)
\(\beta_2 = 2\beta_1; \beta_3 = 3\beta_1; \beta_4 = 4\beta_1\)
\(H_1: \beta_1 < \beta_2 < \beta_3 < \beta_4\) vs \(H_u: \beta_1, \beta_2, \beta_3, \beta_4\) (unconstrained).
\(X \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})\), \(Y \sim \mathcal{N}(X\boldsymbol{\beta}, 1 - R^2)\) or \(Y \sim \mathcal{B}(p)\).
\[\boldsymbol{\mu} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ \end{bmatrix}, ~~~~~~ \boldsymbol{\Sigma} = \begin{bmatrix} 1 & 0.3 & 0.3 & 0.3 \\ 0.3 & 1 & 0.3 & 0.3 \\ 0.3 & 0.3 & 1 & 0.3 \\ 0.3 & 0.3 & 0.3 & 1 \\ \end{bmatrix}.\]
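Under the continuous model, \(\text{Var}(Y) = \text{Var}(X\boldsymbol{\beta}) + (1 - R^2)\). Choosing \(\boldsymbol{\beta}\) so that \(\text{Var}(X\boldsymbol{\beta}) = \boldsymbol{\beta}^\top\boldsymbol{\Sigma}\boldsymbol{\beta} = R^2\) therefore gives \(\text{Var}(Y) = 1\), which matches the \(R^2\) definition above. Write \(\boldsymbol{\beta} = \beta_1 (1, 2, 3, 4)^\top\), and note that \(\boldsymbol{\Sigma}\) has unit diagonal and 0.3 off-diagonal. Then \(\boldsymbol{\beta}^\top\boldsymbol{\Sigma}\boldsymbol{\beta} = \beta_1^2(0.7 \cdot 30 + 0.3 \cdot 10^2) = 51\beta_1^2\), so \(\beta_1 = \sqrt{R^2/51}\).

The NumPy sketch below assumes this scaling is how \(\beta_1\) was set, which the slide does not state. It covers only the continuous outcome, because the link that produces \(p\) for the binary outcome is not given here. The seed is arbitrary.

```python
import numpy as np

# Predictor mean vector and compound-symmetric covariance from the slide
MU = np.zeros(4)
SIGMA = np.full((4, 4), 0.3) + np.diag(np.full(4, 0.7))  # 1 on diagonal, 0.3 elsewhere

# Relative effect pattern: beta_k = k * beta_1
WEIGHTS = np.array([1.0, 2.0, 3.0, 4.0])


def coefficients(r2: float) -> np.ndarray:
    """ASSUMED scaling: choose beta_1 so that Var(X beta) = beta' Sigma beta = r2."""
    beta1 = np.sqrt(r2 / (WEIGHTS @ SIGMA @ WEIGHTS))
    return beta1 * WEIGHTS


def simulate_continuous(n: int, r2: float, rng: np.random.Generator):
    """Draw X ~ N(MU, SIGMA) and Y ~ N(X beta, 1 - r2), so that Var(Y) = 1."""
    beta = coefficients(r2)
    X = rng.multivariate_normal(MU, SIGMA, size=n)
    # rng.normal takes a standard deviation, so pass sqrt of the variance 1 - r2
    y = X @ beta + rng.normal(0.0, np.sqrt(1.0 - r2), size=n)
    return X, y, beta


rng = np.random.default_rng(2024)
X, y, beta = simulate_continuous(n=100, r2=0.09, rng=rng)
print(X.shape, y.shape)  # (100, 4) (100,)
```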
In simulation 2, a randomly selected study has a sample size of \(n = 25\).
Data generated with 5 predictor variables \(X\) and an outcome \(Y\).
Sim3: \(\beta_1 = \beta_2 = \beta_3; \beta_4 = 2\beta_1; \beta_5 = 3\beta_1\)
Sim4: \(X_1\), \(X_2\) and \(X_3\) collapsed into \(X_{c} = \frac{X_1 + X_2 + X_3}{3}\)
Same models, effect sizes, sample sizes, etc.
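Sim3 and Sim4 can be sketched the same way. This sketch makes two assumptions that the slide does not confirm:

- The 5-predictor covariance keeps unit variances and all correlations at 0.3.
- Sim4 analyses Sim3-type data with the first three predictors averaged.

With weights \((1, 1, 1, 2, 3)\), \(\boldsymbol{\beta}^\top\boldsymbol{\Sigma}\boldsymbol{\beta} = \beta_1^2(0.7 \cdot 16 + 0.3 \cdot 8^2) = 30.4\beta_1^2\).

```python
import numpy as np

# ASSUMPTION: same compound-symmetric structure as the 4-predictor design
P = 5
MU = np.zeros(P)
SIGMA = np.full((P, P), 0.3) + np.diag(np.full(P, 0.7))

# Sim3 pattern: beta_1 = beta_2 = beta_3, beta_4 = 2 beta_1, beta_5 = 3 beta_1
WEIGHTS = np.array([1.0, 1.0, 1.0, 2.0, 3.0])


def simulate_sim3(n: int, r2: float, rng: np.random.Generator):
    """Continuous outcome with Var(X beta) = r2 and residual variance 1 - r2."""
    beta = np.sqrt(r2 / (WEIGHTS @ SIGMA @ WEIGHTS)) * WEIGHTS
    X = rng.multivariate_normal(MU, SIGMA, size=n)
    y = X @ beta + rng.normal(0.0, np.sqrt(1.0 - r2), size=n)
    return X, y, beta


def collapse_sim4(X: np.ndarray) -> np.ndarray:
    """Sim4 predictors: X_c = (X_1 + X_2 + X_3) / 3, followed by X_4 and X_5."""
    x_c = X[:, :3].mean(axis=1, keepdims=True)
    return np.hstack([x_c, X[:, 3:]])


rng = np.random.default_rng(1)
X, y, beta = simulate_sim3(n=200, r2=0.25, rng=rng)
X_sim4 = collapse_sim4(X)
print(X_sim4.shape)  # (200, 3)
```

Because \(\beta_1 = \beta_2 = \beta_3\), the term \(X_1\beta_1 + X_2\beta_2 + X_3\beta_3\) equals \(3\beta_1 X_c\). The collapsed model therefore reproduces the same linear predictor, with coefficient \(3\beta_1\) on \(X_c\).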
Data generated with 2 predictor variables \(X\) and an outcome \(Y\).
Sim5: \(\beta_2 = 2\beta_1\).
Sim6: \(X_2\) categorized into dummies \(D_{low}\), \(D_{med}\) and \(D_{high}\).
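A sketch of Sim5 and Sim6 follows. Two details are hypothetical because the slide gives neither:

- \(X_1\) and \(X_2\) have correlation 0.3.
- The cut points for \(D_{low}\), \(D_{med}\) and \(D_{high}\) are the sample tertiles of \(X_2\).

With weights \((1, 2)\), \(\boldsymbol{\beta}^\top\boldsymbol{\Sigma}\boldsymbol{\beta} = \beta_1^2(0.7 \cdot 5 + 0.3 \cdot 3^2) = 6.2\beta_1^2\).

```python
import numpy as np

# ASSUMPTIONS: standard normal predictors with correlation 0.3;
# tertile cut points for the dummies are hypothetical.
MU = np.zeros(2)
SIGMA = np.array([[1.0, 0.3], [0.3, 1.0]])
WEIGHTS = np.array([1.0, 2.0])  # Sim5: beta_2 = 2 * beta_1


def simulate_sim5(n: int, r2: float, rng: np.random.Generator):
    """Continuous outcome with Var(X beta) = r2 and residual variance 1 - r2."""
    beta = np.sqrt(r2 / (WEIGHTS @ SIGMA @ WEIGHTS)) * WEIGHTS
    X = rng.multivariate_normal(MU, SIGMA, size=n)
    y = X @ beta + rng.normal(0.0, np.sqrt(1.0 - r2), size=n)
    return X, y, beta


def dummies_sim6(x2: np.ndarray) -> np.ndarray:
    """Columns D_low, D_med, D_high, split at the (hypothetical) tertiles of x2."""
    cuts = np.quantile(x2, [1 / 3, 2 / 3])
    group = np.digitize(x2, cuts)  # 0 = low, 1 = med, 2 = high
    return (group[:, None] == np.arange(3)).astype(float)


rng = np.random.default_rng(7)
X, y, beta = simulate_sim5(n=50, r2=0.02, rng=rng)
D = dummies_sim6(X[:, 1])
print(D.sum(axis=0))  # number of cases in each category
```

The three dummies sum to 1 in every row. A regression that includes an intercept must therefore drop one of them as the reference category to avoid perfect collinearity.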